Feature Selection and Ensemble Learning Techniques in One-Class Classifiers: An Empirical Study of Two-Class Imbalanced Datasets

Authors

Abstract

Class imbalance learning is an important research problem in data mining and machine learning. Most solutions, including data-level, algorithm-level, and cost-sensitive approaches, are derived using multi-class classifiers, depending on the number of classes to be classified. One-class classification (OCC) techniques, in contrast, have been widely used for anomaly or outlier detection, where only normal or positive class training data are available. In this study, we treat every two-class imbalanced dataset as an anomaly detection problem, which contains a large majority class, i.e. the normal or positive class, and a very small minority class. The objectives of this paper are to understand the performance of OCC classifiers and to examine the level of performance improvement when feature selection is considered for pre-processing and when ensemble learning is employed to combine multiple OCC classifiers. Based on 55 datasets with different ranges of class imbalance ratios, and using the one-class support vector machine, isolation forest, and local outlier factor as the representative OCC classifiers, we found that the OCC classifiers perform well on datasets with high imbalance ratios, outperforming the C4.5 baseline. In most cases, though, performing feature selection does not improve the performance of the OCC classifiers. However, many homogeneous and heterogeneous OCC classifier ensembles do outperform the single OCC classifiers, and some specific combinations of multiple OCC classifiers, both with and without feature selection, perform similarly to or better than the baseline combination of SMOTE and C4.5.
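The core framing of the abstract, fitting one-class detectors on the majority class only, treating minority samples as outliers, and combining several detectors into an ensemble, can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation. The paper uses the one-class SVM, isolation forest, and local outlier factor (LOF); here a simplified novelty-mode LOF is paired with two hypothetical stand-ins (`ZScoreDetector`, a per-feature z-score rule, and `KNNDetector`, a k-nearest-neighbour distance rule), combined by majority vote into a heterogeneous ensemble. The thresholds (`limit=3.0`, `limit=1.5`), `k=5`, and the synthetic Gaussian data are illustrative choices, and no feature selection step is shown. In practice, scikit-learn's `OneClassSVM`, `IsolationForest`, and `LocalOutlierFactor(novelty=True)` would be the usual starting point.

```python
import math
import random
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Point = Sequence[float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighbours(q: Point, train: List[Point], k: int, exclude: int = -1) -> List[Tuple[float, int]]:
    """k nearest (distance, index) pairs of q in train, optionally skipping index `exclude`."""
    return sorted((dist(q, p), i) for i, p in enumerate(train) if i != exclude)[:k]


class ZScoreDetector:
    """Hypothetical stand-in: outlier if any feature is > `limit` std devs from the training mean."""

    def __init__(self, limit: float = 3.0):
        self.limit = limit

    def fit(self, train: List[Point]) -> "ZScoreDetector":
        cols = list(zip(*train))
        self.mu = [mean(c) for c in cols]
        self.sd = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by a zero std dev
        return self

    def is_outlier(self, x: Point) -> bool:
        return any(abs(v - m) / s > self.limit for v, m, s in zip(x, self.mu, self.sd))


class KNNDetector:
    """Hypothetical stand-in: outlier if the mean distance to the k nearest training points
    exceeds the largest leave-one-out value seen on the training data."""

    def __init__(self, k: int = 5):
        self.k = k

    def fit(self, train: List[Point]) -> "KNNDetector":
        self.train = list(train)
        self.threshold = max(
            mean(d for d, _ in neighbours(p, self.train, self.k, exclude=i))
            for i, p in enumerate(self.train)
        )
        return self

    def is_outlier(self, x: Point) -> bool:
        return mean(d for d, _ in neighbours(x, self.train, self.k)) > self.threshold


class LOFDetector:
    """Simplified novelty-mode local outlier factor: outlier if LOF(x) > `limit`."""

    def __init__(self, k: int = 5, limit: float = 1.5):
        self.k, self.limit = k, limit

    def fit(self, train: List[Point]) -> "LOFDetector":
        self.train = list(train)
        nbrs = [neighbours(p, self.train, self.k, exclude=i) for i, p in enumerate(self.train)]
        self.kdist = [n[-1][0] for n in nbrs]    # distance to each point's k-th neighbour
        self.lrd = [self._lrd(n) for n in nbrs]  # local reachability density per training point
        return self

    def _lrd(self, nbrs: List[Tuple[float, int]]) -> float:
        # reachability distance to neighbour j is max(d, k-distance of j)
        reach = [max(d, self.kdist[j]) for d, j in nbrs]
        return 1.0 / (mean(reach) + 1e-12)

    def is_outlier(self, x: Point) -> bool:
        nbrs = neighbours(x, self.train, self.k)
        lof = mean(self.lrd[j] for _, j in nbrs) / self._lrd(nbrs)
        return lof > self.limit


def majority_vote(detectors: list, x: Point) -> bool:
    """Heterogeneous ensemble: x is an outlier (minority) if more than half the detectors agree."""
    return 2 * sum(d.is_outlier(x) for d in detectors) > len(detectors)


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic two-class imbalanced data: only the majority class is used for training.
    majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    minority = [[rng.gauss(6, 0.5), rng.gauss(6, 0.5)] for _ in range(10)]
    ensemble = [ZScoreDetector().fit(majority), KNNDetector().fit(majority), LOFDetector().fit(majority)]
    flagged = sum(majority_vote(ensemble, x) for x in minority)
    print(f"minority points flagged as outliers: {flagged}/{len(minority)}")
```

Voting over an odd number of detectors avoids ties. A homogeneous ensemble, by contrast, would combine several instances of a single detector type, for example LOF with different values of `k`.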

Similar resources

A Pareto-based Ensemble with Feature and Instance Selection for Learning from Multi-Class Imbalanced Datasets

Imbalanced classification is related to those problems that have an uneven distribution among classes. In addition to the former, when instances are located into the overlapped areas, the correct modeling of the problem becomes harder. Current solutions for both issues are often focused on the binary case study, as multi-class datasets require an additional effort to be addressed. In this resea...

A Cost-Sensitive Ensemble Method for Class-Imbalanced Datasets

...costs for the positive and negative classes, SVM can be extended to the cost-sensitive setting by introducing an additional parameter that penalizes the errors asymmetrically. Consider that we have a binary classification problem, which is represented by a data set $\{(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)\}$, where $x_i \in \mathbb{R}^k$ represents a k-dimensional data point and ...

Class-based Aggressive Feature Selection for Polynomial Networks Text Classifiers – an Empirical Study

Feature Selection (FS) is a crucial preprocessing step in Text Classification (TC) systems. FS can be either Class-Based or Corpus-Based. Polynomial Network (PN) classifiers have proved recently to be competitive in TC using a very small subset of corpora features. This paper presents an empirical study of the performance of PN classifiers using Aggressive Class-Based FS. Seven of the state-of-t...

Feature Selection using Distributed Ensemble Classifiers for Very Large Datasets

Datasets are becoming larger and there is an acute need to use data mining techniques to exploit the available data. The increasing size of the datasets poses a challenge to the data miners, which can be solved using two approaches – high speed computing systems, and pre-processing techniques. In this paper, we propose a solution combining the above two techniques using a distributed feature se...

Robustness of learning techniques in handling class noise in imbalanced datasets

Many real world datasets exhibit skewed class distributions in which almost all instances are allotted to a class and far fewer instances to a smaller, but more interesting class. A classifier induced from an imbalanced dataset has a low error rate for the majority class and an undesirable error rate for the minority class. Many research efforts have been made to deal with class noise but none ...

Journal

Journal title: IEEE Access

Year: 2021

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2021.3051969